Datasets, parsers and benchmarks #733
Open · ThomSerg wants to merge 48 commits into master from benchmark_datasets
With all of the work on datasets / benchmarking I've been doing lately, I thought of cleaning up the code and creating some reusable structures to limit code duplication and ease the creation of new datasets / benchmarks.
This pull request is definitely not in a state to be merged, but I'm opening it anyway as a starting point for discussion and to receive feedback. Most things in here are just ideas, meant to start the conversation on more generic datasets / parsers / (competition) benchmarks. I'm not at all "attached" to any of the code. I just tried to get something to work first; now we can start discussing how it should be done "properly".
cpmpy.tools.dataset
A new dataset module is introduced (`cpmpy.tools.dataset`) as a central place to collect ... datasets. I could have placed the code directly in here, but as discussed multiple times internally, we have multiple different concepts of datasets; in short, we distinguish between "model" datasets and "problem" datasets.
Due to this distinction, I also put the "model"-datasets inside a "model" subdirectory.
3 "model" datasets have been added:
(I have a version of PSPLIB, but this one actually belongs to the "problem"-dataset category.)
Each dataset subclasses the generic `_Dataset`, which implements logic that should be shared across all datasets and which provides dataset-specific methods to be overwritten. Mostly, each dataset defines its constructor arguments (e.g. year, track, ...) and a `download` method, so it is quite easy to add new datasets.
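To make that concrete, here is a minimal sketch of the idea; the `_Dataset` body, the subclass name and the URL below are illustrative placeholders, and the actual base class in this PR implements more of the shared logic.

```python
import urllib.request
from pathlib import Path


class _Dataset:
    """Illustrative stand-in for the generic base class: shared logic
    (caching, iterating over instances, ...) lives here."""

    def __init__(self, root="./data"):
        self.root = Path(root)

    def download(self):
        raise NotImplementedError("each dataset implements its own download")

    def instances(self):
        # shared behaviour: fetch the data on first use, then yield instance files
        if not self.root.exists():
            self.download()
        yield from sorted(p for p in self.root.rglob("*") if p.is_file())


class ExampleCompetitionDataset(_Dataset):
    """Hypothetical dataset: constructor arguments (year, track) select the
    variant, download() knows where to fetch it from."""

    URL = "https://example.org/instances/{year}/{track}.zip"  # placeholder URL

    def __init__(self, year=2024, track="main", root="./data"):
        super().__init__(root=Path(root) / f"example-{year}-{track}")
        self.year, self.track = year, track

    def download(self):
        self.root.mkdir(parents=True, exist_ok=True)
        archive = self.root / "instances.zip"
        urllib.request.urlretrieve(self.URL.format(year=self.year, track=self.track), archive)
        # unpacking of the archive is omitted for brevity
```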
Parsers
For each of the datasets, respective parsers have been added to the tools:
You'll notice some differences in the names, due to data formats being more generic than datasets, e.g. MSE is formulated in the more generic WCNF format.
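The shape all parsers share is "callable that takes an instance path and returns a CPMpy Model". As a self-contained illustration of that interface (a stripped-down DIMACS-CNF reader, not one of the parsers added in this PR):

```python
from cpmpy import Model, boolvar, any as cp_any


def read_cnf(path):
    """Minimal DIMACS-CNF-style reader: one Boolean variable per DIMACS index,
    one disjunction per clause. Purely illustrative of the interface (path in,
    Model out); the parsers in this PR handle richer formats."""
    model = Model()
    variables = {}  # DIMACS index -> CPMpy Boolean variable

    def lit(l):
        if abs(l) not in variables:
            variables[abs(l)] = boolvar(name=f"x{abs(l)}")
        var = variables[abs(l)]
        return var if l > 0 else ~var

    with open(path) as f:
        for line in f:
            line = line.strip()
            if not line or line[0] in "cp%":  # skip comments, problem line, end marker
                continue
            clause = [int(tok) for tok in line.split() if tok != "0"]
            model += cp_any([lit(l) for l in clause])
    return model
```

Any such callable can then be handed to the benchmark runner described below.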
cpmpy.tools.benchmark
Whilst running experiments, I've collected many variants of the XCSP3 benchmark runner adapted to the other datasets. So I thought, why not do the same exercise here? This is the part I'm least certain about in terms of how it should best be done.
So next to data formats and datasets, we also have "formalised" benchmarks. They're decoupled from both the parser and the dataset. Take the PB competition as an example: it defines an input format, an output format, and rules on how to "behave" (e.g. how to handle a SIGTERM). The WCNF parser covers the input part, so that one gets reused. The OPB dataset covers instances to test on, but any other dataset in the WCNF format can also be used within the rules of the PB competition. All the other competition rules get captured in this new "benchmark" object.

I again provided a more generic `Benchmark` to be subclassed, but in this case it is also usable on its own: simply provide a callable parser and a path to an instance, and the model will be created and solved with the niceties of us handling everything (memlimits, timeouts, printing, capturing results, ...). Many more arguments are available (like with the xcsp3 competition).
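As a rough sketch of the intended call site: the import path follows this PR's module layout, but the argument names and the `run()` method below are placeholders rather than the final API.

```python
from cpmpy.tools.benchmark import Benchmark   # generic runner proposed in this PR
from my_parsers import read_wcnf              # any callable: instance path -> cpmpy.Model

# The Benchmark object takes care of the "niceties": memory limits, time limits,
# printing and capturing of results.
bench = Benchmark(
    parser=read_wcnf,                    # callable parser
    instance="instances/example.wcnf",   # path to the instance to solve
    solver="ortools",                    # which CPMpy solver to run
    time_limit=1800,                     # seconds
    mem_limit=8000,                      # MB
)
result = bench.run()
```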
But to follow the PB-competition-specific rules, we also have the pre-made `OPBBenchmark` subclass, which customises `Benchmark` to the rules of the PB competition. Since a lot of the Benchmark's behavior has been compartmentalized into different methods, any subclass can easily overwrite these to customise according to the competition rules (e.g. how to format the result, how to report on intermediate results, how to handle sigterms, ...). This subclassing of `Benchmark` allows for the creation of many competition runners with as little duplicate code as possible.
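Roughly what that subclassing could look like; the hook names below are invented for the sake of the sketch and do not necessarily match the actual `OPBBenchmark`.

```python
from cpmpy.tools.benchmark import Benchmark   # same assumed import as above


class PBStyleBenchmark(Benchmark):
    """Illustrative subclass: one overridable hook per competition rule."""

    def print_result(self, status, objective=None):
        # PB-competition-style output: objective line followed by a status line
        if objective is not None:
            print(f"o {objective}", flush=True)
        print(f"s {status}", flush=True)

    def print_intermediate(self, objective):
        # report every improving bound as soon as it is found
        print(f"o {objective}", flush=True)

    def handle_sigterm(self, signum, frame):
        # competition rule: on SIGTERM, report what is known so far and exit cleanly
        self.print_result("UNKNOWN")
        raise SystemExit(0)
```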
That's about it. A lot of code that should probably become separate pull requests after we figure out what to do with it.

(tools.xcsp3 still contains a lot of code from before I attempted to bring things together, e.g. it still has its own dataset / benchmark runner.)